jdconrad
Contributor

@jdconrad jdconrad commented Sep 4, 2025

Backports the following commits to 9.1:

brianseeders and others added 30 commits July 24, 2025 16:36
This reverts commit f0bb49e.

(cherry picked from commit 48cb0ef)
…stic#130279) (elastic#131809)

When a search targets a single shard, the fetch phase is executed in the same roundtrip as the query phase. In that case the search context is reused across phases, and the cancellation runnables registered to the searcher are reused. This is incorrect, as the search timeout check should not be applied to the fetch phase, or it will make fetching partial results infeasible. For instance, if the query phase times out, the fetch phase will time out as well and there won't be partial results. If the query phase does not time out but the fetch phase does, the latter will return a different set of results from those expected, which leads to unexpected situations like an array index out of bounds exception.

This commit clears the cancellation runnables ahead of running the fetch phase in the same roundtrip as the query phase and registers them once again, without the timeout checks, which are never applied to the fetch phase.
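
The clear-and-re-register step described above can be sketched as follows. This is only an illustration: `Searcher`, `TimeExceeded`, and `queryThenFetch` are hypothetical stand-ins invented for this sketch, not the actual Elasticsearch classes or methods touched by the commit.

```java
import java.util.ArrayList;
import java.util.List;

class FetchCancellationSketch {
    // Hypothetical stand-in for a searcher that runs registered cancellation checks.
    static final class Searcher {
        private final List<Runnable> cancellations = new ArrayList<>();

        void addCancellation(Runnable check) { cancellations.add(check); }
        void clearCancellations() { cancellations.clear(); }
        int cancellationCount() { return cancellations.size(); }

        // Called periodically while collecting hits or loading documents.
        void checkCancelled() {
            for (Runnable check : cancellations) {
                check.run();
            }
        }
    }

    // Hypothetical exception signalling that the search timeout elapsed.
    static final class TimeExceeded extends RuntimeException {}

    static String queryThenFetch(Searcher searcher, Runnable taskCancelledCheck, Runnable timeoutCheck) {
        // Query phase: both task cancellation and the search timeout apply.
        searcher.addCancellation(taskCancelledCheck);
        searcher.addCancellation(timeoutCheck);
        boolean timedOut = false;
        try {
            searcher.checkCancelled();
        } catch (TimeExceeded e) {
            timedOut = true; // keep whatever hits were collected so far
        }

        // Fetch phase in the same roundtrip: drop every check registered for the
        // query phase, then register again without the timeout check.
        searcher.clearCancellations();
        searcher.addCancellation(taskCancelledCheck);
        searcher.checkCancelled(); // can no longer fail because of the timeout

        return timedOut ? "partial" : "complete";
    }

    public static void main(String[] args) {
        Runnable neverCancelled = () -> {};
        Runnable alwaysTimedOut = () -> { throw new TimeExceeded(); };
        System.out.println(queryThenFetch(new Searcher(), neverCancelled, alwaysTimedOut));
    }
}
```

In this sketch a timed-out query phase still reaches the fetch phase and reports partial results, because the timeout check is no longer registered when fetching starts.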

Closes elastic#130071

Co-authored-by: jessepeixoto <[email protected]>
…astic#131886)

This change updates `TransportVersion` to support our new model while still allowing the old model to
work, giving us time to migrate.
) (elastic#131892)

Currently in ES|QL if you have a ConstantNullBlock and a DoubleBlock (or
any other standard block type: Boolean, BytesRef, Float, Int, Long) and
your DoubleBlock can be represented as a ConstantNullBlock, then
`doubleBlock.equals(constantNullBlock)` can evaluate as `true` (if
position count is the same).

However, `constantNullBlock.equals(anyDoubleBlock)` is always false.
Likewise, the hashcodes of these two blocks are different, even if
`doubleBlock.equals(constantNullBlock)` returns true.
This PR addresses that by making the hashcodes equivalent and the equals
functions symmetric in returning true.
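
As a rough illustration of the symmetry requirement, here is a minimal sketch. The nested `Block`, `ConstantNullBlock`, and `DoubleBlock` classes are heavily simplified stand-ins written for this example and do not reflect the real ES|QL block classes or how the PR implements the fix. The idea shown is that equality and hashing are defined only in terms of position count, null-ness, and values, so the concrete class no longer matters.

```java
class BlockEqualitySketch {
    // Simplified stand-in: equality depends on content only, never on the concrete class.
    abstract static class Block {
        abstract int positionCount();
        abstract boolean isNull(int position);
        abstract double doubleAt(int position);

        @Override
        public final boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Block)) return false;
            Block that = (Block) other;
            if (positionCount() != that.positionCount()) return false;
            for (int p = 0; p < positionCount(); p++) {
                if (isNull(p) != that.isNull(p)) return false;
                if (!isNull(p) && Double.compare(doubleAt(p), that.doubleAt(p)) != 0) return false;
            }
            return true;
        }

        @Override
        public final int hashCode() {
            // Same formula for every subclass, so equal blocks hash equally.
            int result = positionCount();
            for (int p = 0; p < positionCount(); p++) {
                result = 31 * result + (isNull(p) ? 0 : Double.hashCode(doubleAt(p)));
            }
            return result;
        }
    }

    static final class ConstantNullBlock extends Block {
        private final int positions;
        ConstantNullBlock(int positions) { this.positions = positions; }
        @Override int positionCount() { return positions; }
        @Override boolean isNull(int position) { return true; }
        @Override double doubleAt(int position) {
            throw new UnsupportedOperationException("null block has no values");
        }
    }

    static final class DoubleBlock extends Block {
        private final Double[] values; // a null entry marks a null position
        DoubleBlock(Double[] values) { this.values = values; }
        @Override int positionCount() { return values.length; }
        @Override boolean isNull(int position) { return values[position] == null; }
        @Override double doubleAt(int position) { return values[position]; }
    }

    public static void main(String[] args) {
        Block nulls = new ConstantNullBlock(2);
        Block doubles = new DoubleBlock(new Double[] { null, null });
        System.out.println(doubles.equals(nulls) + " " + nulls.equals(doubles));
        System.out.println(nulls.hashCode() == doubles.hashCode());
    }
}
```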
…#131941)

This PR reduces the logging level of the test logging added in elastic#111360 to Trace. The issue that logging was intended to investigate has been closed, and there doesn't appear to be any current need for this logging. If we need it in the future, it will be trivial to re-enable.
Accidentally added this in when backporting elastic#131954 which will break
some of the tests, so I am removing it
…ic#131924)

Apparently, when calling to_lower or to_upper with no parameters, an NPE was thrown, instead of a proper error. AFAICT, this is an old left-over from when these functions were imported from a much older version.

Resolves elastic#131913.
* Add applies_to to ScalB function in elastic#127696

* Add applies_to to categorize, follow up to elastic#129398

* Add version info, following elastic#127629

* SAMPLE is new + GA in 9.1 elastic#127629

* add applies to for 9.2 option

(cherry picked from commit 5d565b5)

# Conflicts:
#	docs/reference/query-languages/esql/_snippets/functions/parameters/categorize.md
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/grouping/Categorize.java
…le-counting allocations. (elastic#131990) (elastic#131995)

This reverts commit 971cfb9.

The refactoring in 971cfb9 introduced a bug that could potentially lead to double-counting of the number of allocations in the trained model memory estimation.
elastic#131767) (elastic#131778)

* Add TEST MergeWithLowDiskSpaceIT testRelocationWhileForceMerging (elastic#131767)

This adds a test that covers relocation for shards that are running a
force merge.

Relates elastic#93503

* Fix MergeWithLowDiskSpaceIT testRelocationWhileForceMerging (elastic#131806)

The index settings are randomized in the test, but this test suite doesn't work when indices have a custom data path.
…) (elastic#132029)

When encoding an ignored source entry, we write the string length of the 
field name, not the encoded byte count; however, the decode logic treats
this encoded value as the byte length. This patch updates the decode logic
to instead properly treat the value as the string length.
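
To illustrate why the two lengths differ, here is a small sketch. It assumes, purely for illustration, a payload made of the UTF-8 field name followed by the value bytes; `utf8BytesForChars` and `decodeName` are hypothetical helpers and not necessarily how the patch decodes entries. For a name such as `café`, `String.length()` is 4 while the UTF-8 encoding takes 5 bytes, so reading 4 bytes cuts the name short.

```java
import java.nio.charset.StandardCharsets;

class IgnoredSourceNameSketch {
    // Number of UTF-8 bytes that hold the first `charCount` UTF-16 code units,
    // starting at `offset`. Assumes well-formed UTF-8.
    static int utf8BytesForChars(byte[] utf8, int offset, int charCount) {
        int pos = offset;
        int chars = 0;
        while (chars < charCount) {
            int lead = utf8[pos] & 0xFF;
            if (lead < 0x80) {        // 1-byte sequence, one char
                pos += 1;
                chars += 1;
            } else if (lead < 0xE0) { // 2-byte sequence, one char
                pos += 2;
                chars += 1;
            } else if (lead < 0xF0) { // 3-byte sequence, one char
                pos += 3;
                chars += 1;
            } else {                  // 4-byte sequence, a surrogate pair (two chars)
                pos += 4;
                chars += 2;
            }
        }
        return pos - offset;
    }

    // Treats the stored length as a string length (chars), not a byte length.
    static String decodeName(byte[] payload, int nameCharCount) {
        int byteLength = utf8BytesForChars(payload, 0, nameCharCount);
        return new String(payload, 0, byteLength, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "caf\u00e9";
        byte[] payload = (name + "value").getBytes(StandardCharsets.UTF_8);
        // Misreading the stored length as a byte count truncates the name.
        System.out.println(new String(payload, 0, name.length(), StandardCharsets.UTF_8));
        System.out.println(decodeName(payload, name.length()));
    }
}
```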
…32033)

This adds support for splitting `Page`s of large values when loading
from a single segment with non-descending hits. This is the hottest code
path, as it's how we load data for aggregation. So! We had to make very
very very sure this doesn't slow down the fast path of loading doc values.

Caveat - this only defends against loading large values via the
row-by-row load mechanism that we use for stored fields and _source.
That covers the most common kinds of large values - mostly `text` and
geo fields. If we need to split further on doc values, we'll have to
invent something for them specifically. For now, just row-by-row.

This works by flipping the order in which we load row-by-row and
column-at-a-time values. Previously we loaded all column-at-a-time
values first because that was simpler. Then we loaded all of the
row-by-row values. Now we save the column-at-a-time values and instead
load row-by-row until the `Page`'s estimated size is larger than a "jumbo"
size which defaults to a megabyte.

Once we load enough rows that we estimate the page is "jumbo", we then
stop loading rows. The Page will look like this:

```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        | <-- after loading this row
|      |     |      |      |        |     we crossed to "jumbo" size
|      |     |      |      |        |
|      |     |      |      |        |
|      |     |      |      |        | <-- these rows are entirely empty
|      |     |      |      |        |
|      |     |      |      |        |
```

Then we chop the page to the last row:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
| XXXX |     | XXXX |      |        |
```

Then fill in the column-at-a-time columns:
```
| txt1 | int | txt2 | long | double |
|------|-----|------|------|--------|
| XXXX |   1 | XXXX |   11 |    1.0 |
| XXXX |   2 | XXXX |   22 |   -2.0 |
| XXXX |   3 | XXXX |   33 |    1e9 |
| XXXX |   4 | XXXX |   44 |    913 |
| XXXX |   5 | XXXX |   55 | 0.1234 |
| XXXX |   6 | XXXX |   66 | 3.1415 |
```

And then we return *that* `Page`. On the next `Driver` iteration we
start from where we left off.
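
The load order described above can be sketched roughly as follows. `Page`, `loadPage`, and the character-count size estimate are hypothetical simplifications for this example (one row-by-row text column and one column-at-a-time int column), not the real ES|QL operator code.

```java
import java.util.ArrayList;
import java.util.List;

class JumboPageSketch {
    // Simplified page with one row-by-row column and one column-at-a-time column.
    static final class Page {
        final List<String> text = new ArrayList<>();
        final List<Integer> ints = new ArrayList<>();
        int rowCount() { return text.size(); }
    }

    // Loads up to maxRows rows starting at `from`. The caller starts the next
    // page at from + page.rowCount().
    static Page loadPage(String[] storedText, int[] docValues, int from, int maxRows, long jumboBytes) {
        Page page = new Page();
        long estimatedBytes = 0;
        int end = Math.min(storedText.length, from + maxRows);

        // Step 1: load row-by-row values until the estimated size is "jumbo".
        for (int row = from; row < end; row++) {
            page.text.add(storedText[row]);
            estimatedBytes += storedText[row].length(); // crude size estimate
            if (estimatedBytes > jumboBytes) {
                break; // this row crossed the threshold: keep it and stop loading
            }
        }

        // Steps 2 and 3: the page is chopped to the rows loaded so far, and the
        // column-at-a-time values are filled in for exactly those rows.
        for (int row = from; row < from + page.rowCount(); row++) {
            page.ints.add(docValues[row]);
        }
        return page;
    }

    public static void main(String[] args) {
        String[] text = { "aaaa", "bbbb", "cccc", "dddd" };
        int[] ints = { 1, 2, 3, 4 };
        int from = 0;
        while (from < text.length) {
            Page page = loadPage(text, ints, from, 10, 6);
            System.out.println(page.text + " " + page.ints);
            from += page.rowCount();
        }
    }
}
```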
)

* Restrict remote ENRICH after FORK (elastic#131945)

* Restrict remote LOOKUP JOIN after FORK

(cherry picked from commit 24aefcc)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java

* Not for backport
…32058)

This fixes incomplete entitlements added in elastic#131680 and is necessary due to the lack of entitlement delegation.
…131429) (elastic#132073)

* Prevent auto-sharding for data streams in LOOKUP index mode

* Update docs/changelog/131429.yaml

* Reduce test duplication
)

Previously, entitlement checks got disabled when resetting the policy manager (which defaults to inactive). This change makes sure entitlements are correctly enabled during tests.

Due to the lack of entitlement delegation (and usage of server's FileSystemUtils and similar in test code), there are a few remaining issues:
- various tests have to run without entitlements
- node base dirs cannot be removed immediately when shutting down the node due to pending cleanups (wipePendingDataDirectories)

Due to Netty dependency issues (ES-12435), azure and inference tests have to run without entitlements.
This reverts commit afa7fec.

(cherry picked from commit 54d0e6f)
pquentin and others added 17 commits September 2, 2025 16:38
…3849) (elastic#133864)

This refactors ZERO, MINIMUM_COMPATIBLE, and MINIMUM_CCS_VERSION into TransportVersion.VersionsHolder and load them from /transport/constants/....
…ic#134010)

Today we report having disconnected from a node along with the root
cause, typically a `SocketException`, but for troubleshooting purposes
we may also need to identify the exact TCP channel which was affected by
the exception. This commit adds a `NodeDisconnectedException` wrapper to
add the additional details.
…33846) (elastic#134014)

Part of elastic/kibana#231200.

We realized that when the mappings are updated in a version upgrade,
the mappings are in fact updated, but not applied to the current
reporting data stream.

To fix this, we'll make a determination in Kibana that we should roll
over the data stream if the version in the template does not match the
version in the mappings of the data stream.  If they match, we don't need
to do anything.  If they don't match, or the version of the mappings is
not in the data stream (from before this PR), we will need to roll over the
data stream to apply the new mappings.

To make this happen, we need to add a new field to the mapping `_meta`,
which should match the template version.
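
The decision rule above can be written down as a tiny sketch. The check itself is made in Kibana, so this Java version is only an illustration of the rule; `shouldRollOver` and its parameters are hypothetical names, with a `null` mapping version standing for a data stream created before the `_meta` field existed.

```java
class ReportingRolloverSketch {
    // Returns true when the data stream must be rolled over to pick up new mappings.
    static boolean shouldRollOver(int templateVersion, Integer mappingMetaVersion) {
        if (mappingMetaVersion == null) {
            return true; // no version recorded in the mappings (from before this change)
        }
        return mappingMetaVersion.intValue() != templateVersion;
    }

    public static void main(String[] args) {
        System.out.println(shouldRollOver(3, null));
        System.out.println(shouldRollOver(3, 2));
        System.out.println(shouldRollOver(3, 3));
    }
}
```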

references:

- https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-meta-field

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit ebb94bd)

# Conflicts:
#	x-pack/plugin/stack/src/main/java/org/elasticsearch/xpack/stack/StackTemplateRegistry.java
This change takes into account that not all versions have a previous
unreleased minor version (because we are on the oldest active
development branch), or that the current version has minor == 0 (e.g. 9.0),
so the previous minor has to be calculated differently.

* Add FwC branch configuration and update periodic trigger logic

Ensuring that only relevant branches are considered.

* Correct FWC periodic pipeline variables

Utilize Buildkite matrix syntax and escape env in command.
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| docker.elastic.co/wolfi/chainguard-base | final | digest | `4dbe940` -> `bb3bb94` |
| docker.elastic.co/wolfi/chainguard-base | stage | digest | `4dbe940` -> `bb3bb94` |
| docker.elastic.co/wolfi/chainguard-base-fips | final | digest | `d9382de` -> `e5602c7` |
| docker.elastic.co/wolfi/chainguard-base-fips | stage | digest | `d9382de` -> `e5602c7` |

---

### Configuration

📅 **Schedule**: Branch creation - "after 1pm on tuesday" (UTC),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once
you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get
[config help](https://elastic.slack.com/archives/C07AMD4CNUR) if that's
undesired.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate
Bot](https://redirect.github.com/renovatebot/renovate).
…4040)

Split from elastic#133677.
This PR changes these ML forecast tests to use -1 instead of -1s.
The use of units with a -1 time value is not documented as a valid
value and its use should be eliminated.
Once this change has been backported successfully the
compatibility tests in elastic#133677 should pass.
…lastic#134055)

This had to be changed because there is no access to the classes
described as alternatives in the deprecated field.
…ncer (elastic#133919) (elastic#134051)

The static method TrainedModelAssignmentRebalancer.getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference was used to subtract load.getAssignedNativeInferenceMemory() from load.getFreeMemoryExcludingPerNodeOverhead(). However, in NodeLoad.getFreeMemoryExcludingPerNodeOverhead(), native inference memory was already subtracted as part of the getAssignedJobMemoryExcludingPerNodeOverhead() calculation.

This led to double-counting of the native inference memory. Avoiding this double-counting allows us to remove the private method getNodeFreeMemoryExcludingPerNodeOverheadAndNativeInference() entirely.
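
A small numeric sketch of the double-counting follows. The `NodeLoad` class here is a hypothetical simplification with made-up fields and numbers; only the relationship described above is modelled, namely that the free-memory figure already has native inference memory subtracted through the assigned job memory.

```java
class DoubleCountingSketch {
    // Simplified stand-in for the node load figures discussed above.
    static final class NodeLoad {
        final long maxMemory;
        final long perNodeOverhead;
        final long otherJobMemory;
        final long nativeInferenceMemory;

        NodeLoad(long maxMemory, long perNodeOverhead, long otherJobMemory, long nativeInferenceMemory) {
            this.maxMemory = maxMemory;
            this.perNodeOverhead = perNodeOverhead;
            this.otherJobMemory = otherJobMemory;
            this.nativeInferenceMemory = nativeInferenceMemory;
        }

        // Assigned job memory already includes native inference memory.
        long assignedJobMemoryExcludingPerNodeOverhead() {
            return otherJobMemory + nativeInferenceMemory;
        }

        long freeMemoryExcludingPerNodeOverhead() {
            return maxMemory - perNodeOverhead - assignedJobMemoryExcludingPerNodeOverhead();
        }
    }

    // Old behaviour: native inference memory is subtracted a second time.
    static long freeMemoryDoubleCounted(NodeLoad load) {
        return load.freeMemoryExcludingPerNodeOverhead() - load.nativeInferenceMemory;
    }

    // New behaviour: use the free-memory figure as it is.
    static long freeMemory(NodeLoad load) {
        return load.freeMemoryExcludingPerNodeOverhead();
    }

    public static void main(String[] args) {
        NodeLoad load = new NodeLoad(1000, 100, 200, 300);
        System.out.println(freeMemoryDoubleCounted(load));
        System.out.println(freeMemory(load));
    }
}
```

With these made-up numbers the two results differ by exactly the native inference memory, which is the amount the old calculation under-reported.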
… EAs (elastic#133708) (elastic#133824)

* [Gradle] Unify resolving pre release java versions like RCs and EAs (elastic#133708)

* Update distro packaging when testing java ea versions

* Unify resolving pre release java versions like RCs and EAs

This reworks how we deal with pre-release java versions for testing.

Passing `-Druntime.java=25-pre` will pick the latest build, which could be either an EA
or an RC version.

Passing a build number explicitly works by running the build with
`-Druntime.java=25-pre -Druntime.java.build=36`, which as of now would pick an RC build.

This also tweaks the archive packaging in case of a defined pre release version. This is
used downstream when packaging serverless images including ea / rc versions

* Bring back getJavaDetails used when configuring fips

* Adopt ReproduceInfoPrinter

(cherry picked from commit 5e71f1a)

* Fix merge conflicts
…etting (elastic#134099) (elastic#134108)

Ensure green after rollover to avoid unexpected license state flipping.

Resolves: elastic#133455
…33793) (elastic#134111)

This PR focuses on the short-term solution, which adds the logs-sentinel_one.application-* and logs-sentinel_one.application_risk-* indices under the kibana_system role with deletion privileges, to prevent a failed deletion error when the index enters the deletion phase of the ILM lifecycle in an upcoming PR. Because it also ships a transform pipeline, read and write permissions are required as well.

Current behavior:
Deleting the index fails with a permission error.

(cherry picked from commit bfde47a)
…lastic#134084)

This reverts our other bootstrapping of file-based transport versions
to make migration simpler moving forward and adds the initial version
one prior to the last initial version for each of 9.1, 9.0, 8.19, and 8.18.
@jdconrad jdconrad added the :Core/Infra/Transport API Transport client API label Sep 4, 2025
@jdconrad jdconrad requested a review from a team as a code owner September 4, 2025 14:31
@jdconrad jdconrad requested review from a team as code owners September 4, 2025 14:31
@jdconrad jdconrad added backport Team:Core/Infra Meta label for core/infra team auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) v9.1.4 labels Sep 4, 2025
@jdconrad jdconrad closed this Sep 4, 2025
Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport :Core/Infra/Transport API Transport client API >refactoring Team:Core/Infra Meta label for core/infra team v9.1.4 v9.2.0
